Large image collections are increasingly significant for media, scientific and other platforms. A case study of Facebook’s predictive modelling of satellite images of human settlement exemplifies how images en masse affect the platform, supporting both its expansion and its infrastructuralisation. Under platform conditions, the operation of such models entails shifts in the position of human observers, rearrangements of image collections, reconfiguration of platform architecture and, more generally, a significant shift in the referential functioning of images. The treatment of images in a predictive model – a deep neural network – embeds an indexical field, a field that allows the platform not only to generate referential statements about the world but also to position itself in the world. Under platform conditions, image collections function less as archives, records or a surveillant gaze and more as densely woven indexical fields that orient and locate the platform’s operations in everyday experience. In describing this transformation of image collections, the paper points to important changes in how platform media embed themselves in the world.
A fragment of a larger world is compressed into a piece …, impressing its surface with color and light without taking the position of a viewer external to it into account. No scale or human measure is assumed (Alpers 1983, 37)
Facebook’s Connectivity Lab announced in 2016 that it had developed a machine learning model – ‘the High Resolution Settlement Layer’ (HRSL) – that identifies individual buildings ‘across the globe’ (Gros 2016; Metz 2016). The model ‘learned’ to see individual buildings from a collection of several billion satellite images.1 Recognising and locating individual buildings, the Facebook project team counted ‘human artifacts’ down to five-metre resolution:
We invoked Facebook’s image-recognition engine — based on a deep convolutional neural network that provides a fixed dimensional feature embedding for all images — and found that, with minor modifications, we could use the engine trained on normal photos to efficiently detect whether a satellite image contained a building. (Gros 2016)
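The general technique the quotation gestures at, reusing the fixed-dimensional feature embeddings of an already trained network and fitting only a small classifier on top of them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Facebook’s implementation: the embeddings are fabricated toy vectors standing in for the output of a frozen, pretrained network, the classifier is plain logistic regression chosen only for brevity, and the names `toy_embeddings`, `train_head` and `predict` are hypothetical.

```python
import math
import random

# ASSUMPTION: stand-in for a frozen, pretrained image network. Each image is
# represented only by a fixed-length feature vector (an "embedding"); here the
# vectors are synthetic, with "building" tiles shifted to a different mean.
def toy_embeddings(n, dim, has_building, rng):
    """Return n synthetic embeddings of length dim."""
    shift = 1.0 if has_building else -1.0
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_head(xs, ys, lr=0.1, epochs=200):
    """Fit a logistic-regression 'head' on frozen embeddings.

    Only these weights are learned; the (hypothetical) embedding network
    is left untouched.
    """
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x, threshold=0.5):
    """True if the tile is classified as containing a building."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold

# Usage: train on labelled toy tiles, then classify unseen ones.
rng = random.Random(0)
xs = toy_embeddings(50, 8, True, rng) + toy_embeddings(50, 8, False, rng)
ys = [1] * 50 + [0] * 50
w, b = train_head(xs, ys)
```

The design choice the sketch illustrates is that the embedding stays fixed and only the small head is fitted, which is one common reading of ‘minor modifications’; whether HRSL was built in exactly this way is an assumption, not something the cited sources document.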
A survey of human settlement is useful to platforms like Facebook: by combining building locations with census data, a social network platform can plan the build-out of connectivity for people not yet connected to the internet. Network infrastructure can be placed where it is most likely to contribute to the growth of the platform’s user base. Satellites, drones, microwave wireless links and other Facebook infrastructure, including data centres, can be positioned accordingly. The model supports the ‘infrastructuralisation’ of the platform, as recent sociologists of platforms term it (Plantin et al. 2016): the process whereby platform media insinuate themselves into everyday life as mundane affordances. Not coincidentally, as so often seems to be the case, Facebook also aimed to demonstrate the platform’s epistemic capacity to ‘know’ the world better. As Noortje Marres writes, ‘forms of knowledge have become the focal point of controversy in digital societies’ (Marres 2017, 184).
Whether or not Facebook’s model of human settlement, purportedly the most detailed such model ever constructed, garners the platform new users is not so important for the purposes of this paper. The relevance of the example lies in how machine learning ‘embeds’ – a key term in this paper – large image collections in contemporary platforms, whether they are social networks, search engines, scientific knowledge infrastructures, government surveillance systems, autonomous vehicles or smartphones. The image collection feeding directly into HRSL comes from DigitalGlobe’s satellites (DigitalGlobe 2016; Facebook Connectivity Lab and Center for International Earth Science Information Network (CIESIN) 2017), which, like a dozen or so other earth imaging systems, photograph the earth at varying degrees of spatial resolution. But HRSL also relies indirectly on social media image collections. The so-called ‘normal photos’ in Facebook’s announcement reportedly derive from the Instagram photo-sharing platform, where faces, places and specific things are labelled with hashtags by its many users (Shankland 2018). This mundane photo-sharing image collection, with its tags, provides a foothold for HRSL’s identification of buildings in the DigitalGlobe satellite images: ‘with minor modifications, we could use the engine trained on normal photos to efficiently detect whether a satellite image contained a building.’
Something quite complicated has happened around image collections. Facebook started life as an online archive of student photographs. The making, circulation and viewing of photographic images are richly differentiated yet ordinary cultural practices on platforms (Van Dijck 2013; Miller and Sinanan 2017). Visual and media studies have begun to account for the effects of platforms on images. Scholars have studied the effects of platforms and networks on image compression techniques (Mackenzie 2008; Cubitt 2014), file formats (Sterne 2012) and camera-enabled devices (Kember 2014; Mackenzie and Munster 2019). The challenges of finding, sorting, classifying, labelling and ranking images have led to far-reaching changes in platform architecture and have occasioned globally distributed data centres heavily tasked with rapid image retrieval.
Platform image architectures now go, however, well beyond their characterisation in terms of a generic ‘database logic’ aimed at retrieval (Manovich 2001).2 Archival retrieval operations have undoubtedly paved the way for the predictive observing and re-weaving of image referentiality. The state of affairs the HRSL model seeks to address is typical. It concerns vast accumulations of images whose value lies in aggregation rather than in individual experience. We can, of course, encounter, interpret or evaluate images singly. When we look at the image in Figure 1, we can say with some confidence ‘this is a house’, ‘this is a field’ or ‘this is a road’. Whatever manipulations that image may undergo, even in being presented here, our phenomenological investment in the relation between the photographic image and a pre-existing reality remains. As Tom Gunning emphasises, the referential effect of photographic images depends on some mixture of ‘perceptual richness and nearly infinite detail’ (Gunning 2004, 45). The referential significance of images collected under platform conditions is something different.